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Abstract 


We develop a Bayesian framework for estimation and prediction of dynamic models for observations 
from the two-parameter exponential family. Different link functions are introduced to model both the 
mean and the precision in the exponential family allowing the introduction of covariates and time series 
components. We explore conjugacy and analytical approximations under the class of partial specified 


models to keep the computation fast. The algorithm of West et al. (19851 is extended to cope with the two- 


parameter exponential family models. The methodological novelties are illustrated with two applications 
to real data. The first, considers unemployment rates in Brazil and the second some macroeconomic 
variables for the United Kingdom. 


1 Introduction 


Generalized linear models (GLMs) have become a standard class of models in the data analyst’s 


toolbox. Proposed by Nelder and Wedderburn (19721, GLMs are widely used in many areas of knowledge. 
They allow modelling data of many different natures via the probabilistic description as an element of 
the exponential family and relating the response mean and the linear predictor in a non-linear form. The 
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GLM class is a useful alternative for data analysis since it accommodates skewness and heteroskedasticity, 
besides allowing analysis using the data in their original scale. The evolution of these models as well as 
details regarding inference, fitting, model checking, etc., is documented in the seminal book of|McCullagh| 


and Nelder (19891 and many others in the recent literature. 


The main criticism of the use of the one-parameter exponential family in certain applications is that 
samples are often found to be too heterogeneous to be explained by a one-parameter family of models 
in the sense that the implicit mean-variance relationship in such a family is not supported by the data. 
To overcome this limitation Gelfand and Dalai (|1990 ) and Dey et al. (1997) introduced the class of two- 


parameter exponential family, which includes the ones presented by Efron (19861 and Lindsay (1986) as 
special cases. They argue that the introduction of a second parameter allows taking into account the 
over-dispersion usually present in the data, an issue that has been recognized by data analysts for many 
years. 

During the 1990s, special attention was devoted to modelling the mean and the variance 
simultaneously. Taguchi type methods led to some efforts to jointly model the mean and the dispersion 
from designed experiments, avoiding the data transformation that is usually necessary to satisfy the 


assumptions of traditional linear models Nelder and Lee (20011. The process of quality improvement 
aims to minimize the product variation caused by different types of noise. Quality improvement must 
be implemented in the design stage via experiments to assess the sensitivity of different control factors 


that affect the variability and mean of the process. Nelder and Lee (20011 discussed how the main ideas 
of a GLM can be extended to analyse Taguchi’s experiments. From a static point of view, the Bayesian 
inference for this class of models is fully discussed in the papers previously cited, while some alternative 


aspects of MCMC are discussed in Gepeda and Gamerman (2005) and Gepeda et al. (2011). 


Our aim in this article is to extend the class of models introduced by Gelfand and Dalai (1990) 
and Dey et al. ( |1997 ) to deal with time series data and to propose a fast algorithm for estimation and 
prediction of this class of models. To reach this objective we propose an algorithm based on analytical 
approximations, for example, based on Laplace approximations. This way we are extending the conjugate 


updating method proposed in West et al. (1985). 

The remainder of the manuscript is organized as follows. Section introduces the class of models 
we are focused on. In Section the conjugate updating of West et ah] ( |1985 ) is extended to the two- 
parameter exponential family. Section illustrates the proposed method with two case studies: the first 
one models unemployment rates in Brazil and the second one models some data on the UK economy as 
beta distributed data. Section concludes with a discussion and possible future research directions. 
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2 Extended Dynamic Generalized Linear Models 


In this section we introduced the class of extended dynamic generalized linear models (EDGLM). 
First we briefly revise the two-parameter exponential family and the dynamic generalized linear models, 
mainly aiming to fix the notation to be used in this paper. A special parametrization of the two- 
parameter exponential family is presented in this section. It is very useful to deal with data analysis 
when heterogeneity in the sample is greater than that explained by the variance function in the one- 
parameter exponential family. The distributions in this family are often used in many applications in the 
current literature, not only to deal with the topic of extra variability. 

The two-parameter exponential family has the form 


p{y\SA) =a{y)eyip{4>[9di{y) + d 2 {y)]- ,y & T c IR 


( 1 ) 


where a(-) is a non-negative function, di(-) and d 2 {‘) are known real functions, (0, </>) G 0 x $ C IR X IR+ 
and exp{—p(6*, (/))} = J a{y) exp {(f>[9di{y) + d 2 {y)]} dy < oo. This is a suitable reparameterization of the 
general two-parameter exponential family as defined in Bernardo and Smith (1994). 

This class includes many continuous distributions, such as the normal with unknown mean and 
variance, the inverse Gaussian and the beta distributions, parameterized by its mean and precision 


factor. The expression for the variances, as we will see in section 3.3 make clear the relevance of the 
precision parameter, cj), to control the model variance. Large values of (/) corresponds to more precise data 
or equivalently with smaller variance. Some discrete distributions are also included in this class, such as 
the binomial (with the sample size known) and Poisson distributions, taking the scale parameter as fixed 
and equal to one. 

Among other interesting features of this class of distributions, we stress the existence of a joint prior 
distribution for the parameters {9,4>) in the form p{9,(j)\T) = k{t) exp {(j)[9Ti + T 2 ] — TQp{9,(j))} , where 
T = (To,ri,T 2 )' and K('r)“^ = J J exp{4>[9Ti -|-T 2 ] — Top{9,(j))} d9dcj). Let ip = {9, (jj) £ ^ = & x 
to make the notation easier. Its prior mode and observed curvature matrix can be straightforwardly 
obtained differentiating the expression above with respect to the parameters vector ip. More specifically, 
the mode and curvature matrix satisfy the equations 


d 

Ip = aigmax ■—log{p{ip\T)) and J{ip) = 
i/> oip 

Then it follows, after some algebra, that 


92 


dip' dip 


log(p(t/>|x)). 


Xi -'^o-^p{ip) 

9ti +T2 -TQ—p{lp) 



and J{'ip) = 


d^ 


-rot^pi-ip) 


d"^ 
'^1 - 


n - To 


d9d(p 

92 
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The predictive distribution is also defined in closed form, as 

p{y\T) = a{y) € T, where t* = (tq + 1, n + di(y), ra + d 2 (?/))'. (2) 

Now that the basic notation is clearly stated, we can progress to the dynamic version of the extended 
generalized linear model. Let yi, • • • , yr be conditionally independent observations from the two- 
parameter exponential family and for each t G {1, denote E[yt\^p^] = pt. Let us suppose that both 

the mean pt and the precision (jjt can be described by explanatory variables through possibly different 
non-linear link functions, denoted by gi and ya- 

Therefore, given the prior moments of the latent states (3^, the class of models to be considered 
in this paper is described by three components. The first is a conditional conjugate model describing 
observations in the two-parameter exponential family with its prior distribution; 

ytl'j/’t ~ ^/(ytlV’t) and CEf{Tt), Vt=l,---,T, (3) 

where Ef{jjt\xjj^) denotes a distribution in the two-parameter exponential family 0: CEf{Tt) represents 
its conjugate prior distribution and Dt-i denotes all the information available up to time t — 1. 

A general link function is introduced to relate the linear predictors with the mean and precision of 
the observational distribution evaluated as functions of '0: 

Vt = aitPt) = and /3( = -H a;*, ~ [0, Wt], (4) 

with g : MxIR''’ —^ IR^, g{'il^t) = (5i(Mt)) Ft is apx2 matrix, wherep = pi-l-p 2 , withpi = dim 

and /3(j = {Pui, ■ • • , Pupi)', * = 1,2, the latent variables vector related to /it and (j)t. Depending on the 
specification of Ft a broad class of models can be entertained. If Ft = diag(Ffi, Fat), different time series 
components and covariates are used to describe the time evolution of gt and (pt through the link functions. 
Of course, they can also share some common regressors. The state parameters’ evolution is described 
by a partially specified distribution, with uit [0, Wt], where [a, b] denotes a distribution specified just 
by its first and second moments. The state parameters’ initial information, /3 q|I1o ~ [mo,Co], is also 
partially specified with prior moments ttiq and Cq ■ 

Therefore equations in ([^ and Q, together with the state parameters’ initial information, define a 
class of partially specified models, where only the first and second prior moments are defined for the 
vector of latent components. 
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3 Inference in EDGLM 


The class of models described by (§ and|^ extends the models treated in |West et al.| ( |19^ not only 
allowing the scale parameter to vary in time, but also modelling it through an additional link function. 
This extension implies that the original algorithm is not immediately applicable. 


The conjugate updating algorithm of West et al. (1985) is extended, making estimation in this class of 


models feasible. The estimation is still based in the conjugate distribution and linear Bayes estimation, 
updating sequentially the state vector distributions at each time t, as in the original algorithm. At the 
end of this process, we obtain both the first and second posterior moments of latent states vectors and 
the posterior distribution of {ip^\Dt) for each instant t. 

In the next subsections, we review the main steps involved in the conjugate updating algorithm 
mainly to set up the notation, and propose a strategy to reduce the system dimension. We also discuss 
the forecasting distribution and conclude with some examples. 


3.1 Extended Conjugate Updating 

The conjugate updating algorithm is based on the steps: evolution, moments equating and updating. 
The evolution step involves obtaining the first and second moments of the state vectors prior distribution, 
at := E[f3t\Dt-i] = Gtrrit-i and Rt := Var{f3t\Dt-x) = GtCt-iG't + Wt, given the posterior mean 
and variance at time t — 1, mt-\, Gt-i and the state evolution variance Wf The prior moments for the 
linear predictors follow immediately as: ft := E[r]t\Dt-i] = F'tat and Qt ■= Var{T]t\Dt-i) = FtRtFt- 

Alternatively, the prior moments of the linear predictor, rjt = g{'tpt)^ can be obtained as functions 
of parameters defining the conjugate prior. Denote these prior moments as: E[ri^\Dt-i\ = h{Tt) and 
Var{rit\Dt-i) = H{Tt), where h : —>■ H : IR^ —^ Ai and Ad is a set of symmetric positive 

definite 2x2 matrices and t = (To,ri,T 2 )' is the parameters vector of the conjugate prior. 


We are facing a similar problem to the one posed by Poole and Raftery (2000) in the context of 


computer simulation models. There are two prior on the same quantity but based on different sources 
of information. This also occurs in the context of reaching consensus in the presence of multiple expert 
opinions. The analytic expressions of the above moments need to be equated to the linear predictors’ 
numerical moments, previously obtained as functions of the prior moments of the states, providing the 
non-linear system of equations: 

h{Tt) = ft and vec{H{Tt)) = vec{Qt). (5) 

where vec(Al) denotes the vectorization of the upper triangular matrix of a symmetric M.. 

Note that the dimension of the involved vectors and matrices leads to a non-linear system with more 
equations than unknown quantities, so the system (§ does not provide a unique solution for the parameter 


5 











vector Tt- Therefore it is necessary to introduce some criterion to reduce this large set of solutions to 
one compromise solution. A proposal to deal with this sort of dimension incompatibility in system © 
is treated in Section |3.2| This aims to answer the following query: What is the “best conjugate prior 
distribution” corresponding to the partially specified predictive distribution with mean and variance 


QJ 

After observing a new datum, the prior parameters are straightforwardly updated. It follows from 
conjugacy that Tt can be updated according to expressions in ([^, giving a new parameter vector r^. 
The linear predictors’ posterior moments can be obtained analogously to the system equations ([^, given 
f*=h{T’l) and = iT(r(), or analogously, vec(Qj') = vec(JT(Tj')). 


The observed information is propagated to the state vector using linear Bayes estimation (West 


and Harrison (1997), Chapter 4), since its distribution is only partially specified. Then, we obtain the 
posterior moments of /3(, rrit = E[f3t\Dt] = at + RtFtQt~^{fl — ft) and Ct = Var{(3t\Dt) = 
Rt + RtFtQi\Ql-Qt]Qf^F'tRt. The smoothed posterior moments of the latent states can be obtained 


in the same way as rrit and Ct, using linear Bayes estimation, as detailed in Souza (2013), resulting to 

>-i 


expressions, rrit = E[f3t\DT] = m* + CtGt+iRt+i{mt_^_i - Ot+i) 


and 


Ct = VariOtlDr) = 


CtC' 


.t^t+i^t+iiC^t+i - Rt+i)Rt+iGt+iCt, where mf, := my and C^ := Cj 


3.2 Dimensionality Reduction 

To ensure the uniqueness of the vector Tt at each time considered in the algorithm, we need to reduce 
the dimensionality of the system ([^. Several possibilities can be explored for this reduction, including 
arbitrary solutions such as ignoring some equations of the system (§. To avoid such arbitrariness we 
propose an alternative inspired on the generalized method of moments ( |Yin (2009)). Our main objective is 
to match the linear predictors’ moments and the conjugate prior moments preserving as much information 
provided by the system as possible. An optimum solution is obtained by minimizing the quadratic 
distance between the functional form that represents the difference between the numerical moments and 
the moment conditions described by its parameter vector, weighted by a weights matrix $7^ (where k is 
the dimension of the system) and zero. So, an optimum choice for the parameter vector Tt is the one 
that minimizes the function 


Afc(Tt; ft, QtY Ofc Afc(xt; ft, Qt), ( 6 ) 

where I^k{Tt', f t,Qt) = { ft~ ) vec(Q() — vec(iT(xt)) ) is a vectorial function and flk a positive 

definite weight matrix that specifies the importance of each equation condition in the estimation process. 

Actually, since the weight matrix 11determines how each condition is weighted in the system solution, 
a simple choice is to take 11^ = R (identity matrix of dimension k), which corresponds to considering all 
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the equations in system ([^ on equal footing. Of course other choices for the matrix can be considered. 
Intuitively, the more accurate equations should be weighted more than the less accurate ones. A two-stage 


iterative procedure, described in Yin (20091, can be implemented to determine the “optimal” Ctk taking 
into account the observed data. 

In summary, the proposed procedure can be implemented following the algorithm below: 


Extended Conjugate Updating Algorithm: 

At each time t 

Step 1. evolution: given mt_i and Ct-i, 

at = Gtmt_i and RtGtCt-iGt + Wt 
ft = and Qt = F^RtFf 

Step 2. prior moment equating: obtain the prior parameter 
vector Tj, solution of 

argmin{Afc(Tt;/t,Q()' flk Afc(xt;/(, QJ} . 

Tt 

Step 3. posterior moments updating and equating: obtain 
using equation ([^ and calculate ft and Qt using rf in equations 

Step 4. state updating: obtain {mt,Gt) via Linear Bayes 
estimation taking 

mt = at + RtFtQt~^{f*t - ft) and 

Gt = Rt + RtFtQi^[Q*t~Qt]Qi^F[Rt. 


3.3 Some Illustrative Examples 

In this section we present examples involving the normal, the inverse Gaussian and the gamma 
distribution, leaving the discussion of the beta model to the next section. Our aim is to show the 
main functions involved in the Ef definition and their constraints. 


3.3.1 Normal distribution with unknown mean and precision 

Consider model ([^ , where | /i*, ) represents the density function of normal distribution with mean 

Ht = 9t and variance In this case, di{yt) = yt, d^iyt) = -y and p{9t,4>t) = {pi4>t - log(<^t)), 
and the conjugate prior distribution takes the form 

p{lJ,t,(l>t\Dt-i) oc exp{(l)t[ptTit+T 2 t]-TotpiiJ.t,(l>t)]}, Mt G H, e IR’^ 
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which represents the kernel of the density function of the normal-gamma distribution with parameters 

Tit Tot + 1 Tf^ 

—, Tot, —^— and - -- T 2 t. 

Tot ^ ^TQt 

Using the natural link functions iju = gl{^lt) = g-t and 772 * = 52 (</'t) = log((/'t), and the crude 
approximation of the digamma function, tp^x) = log{x) + 0(x), x > xq, to evaluate the moments of the 
linear predictor r/ 2 t, it follows that moment conditions are represented as in the functional form 


^k{T t, ft, Qt) ~ 


fit- — , f2t - log 
, Tot 


( Tot + 1 

T ' 

1 2 

[ 2rot "‘J 


gilt 


'It 


2t, 


T2t 


Ot 


Tot^ 


Tot + 1 


1 -1 


- 1 


<?22i 


Tot + 1 


(7) 


Therefore Tt is obtained as the solution that minimizes the associated quadratic form. 

Note that in this example the prior covariance of the linear predictors at each time t, are 

zero, which indicates that is a diagonal matrix. In fact, it means that rji is orthogonal to 772 given Dt-i, 
so the system reduces to four equations. Nevertheless solving system 0 is not a trivial minimization 
problem since we need to ensure that all involved moments are well defined, in the sense that at each 
algorithm’s iteration. Tot, Tit and T 2 t generate non-negative variances. In this particular example, the 

T-2 

minimization with respect to the vector Tt must satisfy the restrictions rot > 1 and T 2 t < — 7 ^—, assuming 

2rot 

that the first and second moments of expression 0 are well defined. 


3.3.2 Inverse Gaussian distribution 


Suppose that p{yt\gt,4>t) represents the density function of inverse normal distribution with mean pt 

^3 0 , 

and variance in model (31. It is very ease to show that this model is a member of the exponential 


I 


family, taking di{yt) = -yt, d 2 {yt) = -x—, p{pt,4>t) = - 

2?/t 

this case, the conjugate prior distribution for the observational model is 


— + l log((/'t) 

Pt 2 


and a{y) = {2ny^) In 


p{y,t,(l>t\Dt-i) oc exp<-(j)t 


1 I 

-^Tit + -T2t 


2p. 


T0tp{pt,(l>t) 


pt > 0, (pt > 0. 


( 8 ) 


As explained in Banerjee and Bhattacharyya (1979), conditional to (pt, follows a normal 
distribution truncated at zero; and, conditional to pt, <pt follows a gamma distribution. On the other 
hand, p{pt,(pt\Dt-i) does not have an analytically known form, as far as we know, so we approximate 
its mean and variance by the mode (/it, </)*)' and the inverse curvature matrix Vj of the conjugate prior 
distribution ^ evaluated at the mode point, respectively, getting 


dt 


Tit hot 


Tot 


T2t 


‘ot 

Tit 


and 14 = 


-A- 

Tot(pt 

0 


0 


hi 

Tot 


























Using the link functions gi{fit) = log(/it) and g2{4't) = log(0t), and taking first-order Taylor 
approximations of these functions around {jit, (jit)', we obtain the mode and curvature of the linear 
predictors. 

Then to equate the numerical moments of the linear predictors with those obtained using their 
conjugate prior we must solve the system of equations 


= (fit- log(Tit) -k log(Tot) , f 2 t - log 


TotTlt 


TltT2t - 'Tot 


gilt 


TltT2t 


‘Ot 


Tot 


q22t 


2 

Tot 


(9) 


The optimization problem (7), based on like in (9), must satisfy the constraints T 2 t > — with 

U U '^1* 

Tot , Ut > 0 in order to ensure that all variances are positive. 


Tot 


3.3.3 Gamma distribution 

Let yt\gt,4>t denote the density function of the gamma distribution, with mean gt and variance 
^2 .... . . 1 
—. The quantities defining this member of the two-parameter exponential family are: Ot = —, 

gt 


di{yt) = -yt,d 2 (]j) = log(?/t) and pi9t,'pt) = \og(T{(j)t)) - log ) and, therefore, its conjugate 
prior distribution is given by 


p{yt,4>t\Dt-i) oc explipt 


1 

- Tit + T2t 


- Totp{Ot, 4>t) >, /it > 0, > 0. 


( 10 ) 


Since the prior distribution does not represent a known distribution, as far as we know, we opt to use its 


mode and the inverse curvature matrix of the conjugate prior distribution (10) in place of its mean and 
variance. Using the logarithmic link functions for both parameters, we get 


E{r]it\Dt-i]) ^\og{Tit)-\og{Tot) and U(?72i|A-i]) « log 


Tot 


Tot 


log (^) 


T2t 


Uar(77it|A-i) «rotlog(—)- 
Tot I \ Tot 


T2t 


and Var{r] 2 t\Dt-i) « —. 

Tot 


( 11 ) 


Moreover, taking a first order Taylor approximation of the function g{pt,4't) = (log(Mt) log(<('t)) around 


the mode of (101, we obtain the covariance of the linear predictors as 

Cov{git,p 2 t\Dt-i) « log(/it)log(0i) - [log(/it)][log(0i)] = 0- 

Comparing the numerical moments obtained for linear predictors through the dynamic model with 
those obtained by conjugation (expressions ([TI|), we obtain the functional form 


Afc(Tt; ft, Qt) = ^ /it - log , f 2 t - log ^- 


Tot 


T-Otlog(^) -T2t 


1 2 , , , 

, gilt- f- , q 22 t - I , (12) 

Tottpt '^ot 
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whose quadratic distance with respect to zero (possibly weighted by a weights matrix can be 

T2t 

minimized by imposing the constraints Tot > 0, Tit > 0 and — > /it, which ensures that the moments 

Tot 


up to second order associated with the conjugate prior distribution (101 are well defined. 


3.4 Forecasting 

Assume that our interest is to forecast some future observation, for example, at instant t + h (for 
some integer h), based on all observations until instant t. Making use of exponential family’s proprieties, 
it follows from conjugacy that 

K{Tt+h) 


Piyt+h\Dt) = aiyt+h)- 


(13) 


«+h) ’ 

where K{Tt+h) and are the normalization constants involved in the definition of the prior 

and the posterior distribution of the vector {9t+hT 4‘t+h)^ respectively. Here, the parameter vector 


Tt+h = (To.i+ft,, Ti^t+h, T 2 ^t+h) can be obtained analogously to that discussed in Section 3.2 by solving the 
optimization problem 


arg min {Ak{Tt+h; Qt(/i))'rjfcAfc(Tt+/,; ft{h),Qt{h))} , 

t + h 

given the recursive relation between the linear predictor moments 

ftih) = 

QtW = F'^_^_l^Rt{h)Ft+h, with 


(14) 


at{h) = Gt+hat{h-^), 

at(0) = mt, 


Rt{h) = Gt+hRt{h-l)G'^+n + Wt+h 

RM = Ct. 


and 


Note that the vector 


= ( 


0,t+h^ ' l,t+h> '2,t+h 


) is directly obtained like in the relations represented 


in ([^. 

In cases in which the constants k(-) do not have known analytical form, we must use some numerical 
integration method to approximate them. In this work, Laplace approximations are used to solve such 


integrals. All methods are implemented with the aid of routines available in the free software R (Team 


(2011)), like the optimization function nlminb and the function fdHess which numerically approximate 
gradient and Hessian functions. Furthermore, to improve the quality of the approaches, we use a new 
parameterization for the involved prior distributions in terms of their linear predictors rju and ry 2 t, 
integrating new parameters along the real line. See the next section for an example. 
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4 Case Studies 


In this section, two applications are presented to illustrate the performance of the proposed method. 
In both cases we suppose that the observations follow a beta distribution. The first one models 
unemployment rates in Brazil, using a data set that presents a trend component and a stable seasonal 
pattern. Our main interest in this first application is to illustrate the importance of dynamically modelling 
the precision parameter. The second one considers some macroeconomic variables of the United Kingdom, 
viewed as compositional data. In this example our aim is to show the importance of modelling the data 
in their original scale. To start this section, we show the main developments concerning the beta model 
used in both applications. The implementations are carried out through the R software and more details 
is discussed below. 


4.1 Dynamic beta model components 


Consider now p{yt\l^t, 4>t) as the density function of a beta distribution in model ([^, parameterized in 

terms of its mean pt and its variance ——■—In this case, using conjugacy in the exponential family, 

9 

p(/rt,<^t|A-i) cx exp{(j)t[iJ,tTit+T 2 t]+Totp{p,t,(l>t)} , 0 < Pt < (l)t > 0- (15) 


where p{pt,4>t) = - log 


|. Taking gi{pt) = logit(^t) and g2{4>t) = ^og{4>t) and 


approximating first and second moments of (|l5[), respectively, by the mode and the inverse 


curvature matrix evaluated at the mode, we get 


E{git\Dt-i) « logit(/it) = — and Var{git\Dt-i) 

Tot 


I 


E[T] 2 t\Dt-i] « log((^t) = log - 


Tot 


and Var{g 2 t\Dt-i 


V2{rot log(I - pt) - T 2 t}, 

The functional form Q to be minimized depends on the vector function 


TotPti^ - Pt)'Pt 
2 

Tot ' 


(16) 


A.(r.; J., O.) = I - g , - log 


qiit — 


q22t -, (17) 

Tot 


TQtpt{l — Pt)ci>t 

whose minimum must be obtained by imposing restrictions tq* > 0 and T 2 t > —rotlog(l + exp{/it}), 
since we are imposing the condition that Cov{riit,ri 2 t\Dt-i) = 0, such as in the gamma case. 

It is worth noting that although the beta distribution has a conjugated prior represented in equation 


(151, it does not have a known analytical form, as far as we know. So, to find its normalization constant 


we need to approximate the integral 

pOO pi 

K{T0t,Tlt,T2t)~^ = / exp{(j)t[ptTlt+T2t]+Totp{pt,(t>t)}dpt4>t, 

Jo Jo 
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by using a Laplace approximation for its expression. In fact, by changing the variables of the integral in 
(18) in terms of rjn and rj 2 t, we can approximate it as 


K('rot,'rit,T'2t)’ 


poo poo r 


1 + 


Tit + T2t 


- Totp'{mt,V2t) ? dr]itr]2t 


V^\Vt\^^ exp{L((77it,7?2t)}, 


where p'{r]it,V2t) = log (r(e’'^*)) - log ( L ( e' 


cV2t 


1+e’iit Tit + T2t 


1 + e’)i‘ 

+ Totp'{pit,P2t) and Vt = - [V^Lt(77it,?72t)] 


re 


1 

1 + e’71* 

{■nit,V2t)=(,fht,fi2t) 


is the Hessian matrix 


of Lt{pit,r] 2 t), applied in its mode (iyit, mo¬ 
using the R software, the mode (? 7 it,? 72 t) and the Hessian matrix Vt can be easily obtained using, 
respectively, the functions nlminb and fdHess, using the expression Lt{pit,'ri 2 t) as the argument. 


4.2 Unemployment rates in Brazil 

The data for this example was collected by the Brazilian Institute of Geography and Statistics 
(IBGE: http://www.ibge.gov.br/) through its Monthly Employment Survey and deals with monthly 
unemployment rates of working-age people in the major metropolitan regions of Brazil, namely the 
metropolitan areas of Recife, Salvador, Belo Horizonte, Rio de Janeiro, Sao Paulo and Porto Alegre. The 
monthly unemployment rates of working-age people from March 2002 to December 2011, in a total of 
118 observations, can be seen in Figure This time series clearly exhibits components of trend and 
seasonality. 



month 


Figure 1: Unemployment rates of working-age people in the major metropolitan regions of Brazil from 
March 2002 to December 2011. 


It is well known that the yearly seasonal behaviour in this time series is mainly due to temporary jobs 


created by holiday seasons and school vacations, as mentioned by da Silva et al. (20111. Gonsidering these 
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factors, we analysed the data set through a dynamic beta model, where the observational mean evolve as 


a second-order polynomial model with seasonal effect. Unlike da Silva et al. (20111, we assume a more 


parsimonious model, where seasonality is represented by a one-harmonic model and we assume that the 
precisions can evolve dynamically in time. Additionally, we assume that the latent variables associated 
with means and precisions evolve in time independently, taking the matrices Gt, Wt and Cq as block 
diagonal matrices of the form F^ = diag(Fi(, = diag(Git, G 2 t), Wt = diag(TUit, W 2 t) and 

Go = diag(Gio, G 20 ), where the matrices related to the dynamics of the observational means are given 
by 


Fi* = (1,0,1,0)' and Gu = 


J2(l) 0 

0 J2(l,a;) 


Vt, 


where 


J2(l) = 


(' 

u 

and J2(l,a;) = 

1 cos(a;) 

sin(a;) \ 

\ 0 

1 ) 

' 1 

1 — sin(a;) 

cos(a;) / 


27r 

12 ' 


To model the dispersions, we assume a first order dynamic model, taking F 2 t = G 2 t = 1, Vt, in order 
to allow precision parameter </>( to vary in time through the introduction of a random error. 

We chose to specify the error evolution covariance matrices, Wt, t G through the use 

of multiple discount factors assuming to be a block diagonal matrix whose blocks are associated 
with mean level and trend and seasonal components, and a precision level component, taking D = 
hlockdia,g{S~^i(^l 2 ,S~^J'^l 2 ,S^y'^}, where and are discount factors associated with the 

respective blocks of components by replacing the expression of Rt in the evolution step of the algorithm 
with the form Rt = DGtGt-iG'tD. 

Different combinations of discount factors were tried and we selected the one that provided the 
best performance according to some alternative model selection criteria like the mean squared error 
(MSE)based on one-step-ahead forecasting, the joint log-likelihood (LL) and the log-observed predictive 
density (LPD), excluding the first 18 observations, taken as a learning period. Using the selected discount 
factors, namely, 5^^it = 0.90, = 0.95 and = 0.90, we obtained the model parameter estimates and 

the one-step-ahead predictive distributions for the unemployment rates during the period from September 


2003 to December 2011 at each instant, using expression (13) as discussed in the previous subsection with 
the aid of the R routines nlminb and fdHess. 

In Figure it is possible to observe the filtered {E[/3t\Dt-i]) and the smoothed estimated state 
variable means {E[f3t\DT]) related to the observational mean components, describing level, trend and 
seasonality, respectively; and the state variable associated with the observational precision. In fact, 
there is a clearly decreasing trend in the data as well as a seasonal behaviour like observed in Figure 
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Regarding the precision structure, the small growth of the state variable over time can indicate that 
as new information is incorporated in the estimation process, the accuracy of the model increases. 



month 



month 


(a) level 


(b) trend 




month month 


(c) seasonality (d) precision 

Figure 2: Filtered (solid line) and smoothed (dashed line) latent states means for application using 
unemployment data. 


It can be seen in Figures|^and|^that the method generated satisfactory results, since both estimated 
means (the filtered ones E[^t\Dt],) and one-step-ahead predictive distribution means {E[yt\Dt-i]) follow 
the behavior of the real data series, as illustrated by Figures and respectively . Also note that the 
estimated 95% HPD credibility intervals for the one-step-ahead predictive distributions, represented by 
the dashed red lines in Figure]^ are well concentrated and contain the true value of the observations in 
all considered instances. The point and interval estimates for the predictive distributions considered in 
the last six instants can be seen in Table [H 

To illustrate the importance of dynamic modelling for the precision parameter model, we completed 
this application by comparing its results with those obtained using a similar model in which we assumed 
that (/>( = cj), yt, taking null precision evolution errors in matrix Wt. Figure compares the interval 
estimates for the one-step-ahead predictive distribution obtained considering both models. Note that 


14 












month 

Figure 3: Filtered estimated observational means (solid line) based on latent states posterior means 
obtained for application using unemployment data. The gray points represent the true observations. 


Month 

yt 

Mean 

Mode 

IC 

95 % 

2011.07 

0.060 

0.062 

0.061 

[0.054 

, 0.070] 

2011.08 

0.060 

0.059 

0.058 

[0.052 

, 0.066] 

2011.09 

0.060 

0.056 

0.056 

[0.050 

, 0.063] 

2011.10 

0.058 

0.055 

0.055 

[0.048 

, 0.061] 

2011.11 

0.052 

0.054 

0.055 

[0.048 

, 0.061] 

2011.12 

0.047 

0.054 

0.054 

[0.048 

, 0.060] 


Table 1: Point and interval estimates of unemployment rates for the period from July 2011 to December 
2011, based on the predictive distributions p{yt\Dt-i). 

intervals based on a model with (j) fixed in time (represented by the shaded area in the graph) are less 
concentrated, indicating that there was a gain with respect to accuracy of the predictive distributions in 
this case, in which we considered the dynamic modelling of the precision structure. 

4.3 Expenditure shares in the U.K. economy 

As a second illustration of the proposed methodology, we apply the new method to a real data set 
concerning expenditures in the UK economy for the period 1955 to 2012. The quarterly data, obtained 
from the U.K. Office of National Statistics web page (http://www.statistics.gov.uk/), deal with the 
costs of the economy, whose composition is described by consumption (c), investment (i), government 
expenditure (g) and export (e) shares of U.K. gross final expenditure. 
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2002.03 2004.03 2006.03 2008.03 2010.03 

month 

Figure 4: One-step-ahead predictive intervals from September 2003 to December 2011, based on the 
predictive distributions p{yt\Dt-i) of alternative models: the shaded region represents the 95% HPD 
predictive credibility intervals based on a model with (j) fixed in time, whereas the solid lines represent 
the predictive distribution credibility intervals based on a model with a dynamic precision parameter 
structure. The gray points are the observed unemployment rates. 


Despite the compositional nature of the data, in order to use the class of models discussed in this 
article, which includes only univariate observational distributions in the exponential family, we analysed 
each of the rate series separately through a generalized dynamic model whose observations follow the 
beta distributions, and for which we assumed different mean and precision structures. We denote the 
proposed models by the mnemonic Var(Z) and Pol(Z), meaning a vectorial autorregressive component and 
a polynomial trend, respectively, where I is the order of the correspondent model. This models were 
combined to model the transformed observational mean and the transformed observational precision in 
different forms. For each case, as in the previous application, we assumed that the latent variables 
associated with means and precisions model evolve in time independently, taking the matrices F’j, Gt, 
Wt and Co as block diagonal matrices. Under this hypothesis, three different structures were considered 
for the class of models represented by (§ and Q: 

• Var(2)Pol(0) - Second order VAR model for the transformed observational mean and constant for 
the transformed observational precision: 

For the means structure, we assumed that each series can be explained by all the other series, taking 
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two lags in time, assuming a second-order VAR model. For the precision structure, we assumed 
that each series has a constant accuracy in time, taking 

■^1* “ ( Xc^t-2, a^g.t-2) Xe,t-2 ) > 

Gu = / 9 , Wu = 0, and ^ 2 * = 62 * = 1, TVa* = 0, Vt G {1,..., T}, 

where Xc,., xg^,, x-^ , Xe,,, represent, respectively, the rates of consumption, government 
expenditure, investment and exports in previous instants. 

• Var(2)Pol(l) - Transformed observational mean modelled by a second order VAR and precision 
with a first order dynamic structure: 

As in the previous case, we assumed means explained by a second-order VAR model, but in this 
case we allowed the precisions to vary in time according to a first order polynomial model taking 

■^1* “ ( Xc,t-l-i Xg^t-l, Xc,t- 2 , Xg^t- 2 , Xjj._ 2 , Xe,t -2 ) 5 


Gi*=J 9 , Wu = 0, and ^ 2 * = G 2 i = 1, W 2 t ^ 0, Vt G {1,..., T}, 


where, again, Xc,., xg^,, Xj , Xe,., represent, respectively, the rates of consumption, government 
expenditure, investment and exports in previous instants. 

• Pol(2)Pol(l) - Polynomial models for both mean and precision structures: 

For the means we assumed a second-order model in which we considered level and trend for each 
of the series and a first-order structure for the precisions, taking 



Wit^Q and 


F2t = G2t = l, W2ty^0 Vt G {!,...,T}. 


As in the previous application, we chose to specify the covariance matrices through the use of multiple 
discount factors, assuming block diagonal matrices, whose blocks are associated with the respective 
components (level and trend in the case of second-order model and level in the order 1 model) in 
polynomial models. More specifically, considering, for example, the Pol(2)Pol(l) structure, we used 
a block diagonal discount matrix of the form D = blockdiag{5~J/^ J 2 , <5^ where is the discount 

factor associated with mean level and trend components and is the discount factor associated with 
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precision level components, substituting the expression of Rt in the evolution step of the algorithm for 


the form Rt = DGtCt-iG'tD, as discussed in Chapter 6 of 

For each of the rate series and for each of the dynamic structures assumed, different combinations 
of discount factors values were used, so we selected the one that provided the best data fit according 
to the mean squared error (MSE) based on one-step-ahead forecasting, the joint log-likelihood (LL) and 
the log-observed predictive density (LPD) of each series, excluding the first 31 observations (taken as 
learning sample). For this application, different combinations of values 0.90, 0.95 and 0.98 were taken 
for the discount factors and, for all assumed dynamic structures, models with smaller values, namely 
5 ^ 1 ,It = 6^,1 = 0.90, outperformed. Table [^reports adjustment measures for the different dynamic models. 
It can be seen that, according to the criteria used, the model that supposes a second-order Var structure 
for the mean and a first order structure for the precision performs better with lower MSE and values and 
higher LL and LPD values, which makes sense since the Var structure capturing the relationship between 
the different rate series and allows the precision model structure to vary in time, giving greater flexibility 
to the model. 

consumption investment 


West and Harrison 


(1997 


Model 

MSE 

LL 

LPD 

MSE 

LL 

LPD 

VAR(2)Pol(0) 

0.400e-4 

745.507 

724.151 

0.525e-4 

714.757 

694.131 

VAR(2)Pol(l) 

0.400e-4 

759.776 

725.504 

0.505e-4 

730.774 

698.827 

Pol(2)Pol(l) 

0.838e-4 

706.711 

653.134 

1.083e-4 

691.014 

636.617 



government expenditure 


export 


Model 

MSE 

LL 

LPD 

MSE 

LL 

LPD 

Var(2)Pol(0) 

0.105e-4 

881.688 

857.297 

0.495e-4 

726.010 

702.890 

Var(2)Pol(l) 

O.lOle-4 

899.035 

862.609 

0.463e-4 

745.200 

710.554 

Pol(2)Pol(l) 

0.453e-4 

761.649 

707.335 

1.127e-4 

676.193 

621.606 


Table 2: MSE based on one-step-ahead forecasting, joint log-likelihood (LL) and log-observed predictive 
density (LPD) based on consumption, investment, government expenditure and export rates for the 
period from 1963.2 to 2012.3, obtained from different models. 
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Once the Var(2)Pol(l) model was selected, we estimated the parameters and the one-step-ahead 
predictive distributions for the four rate series during the period 1963.2 to 2012.3, as represented by 
Figuresand 1^ respectively. In Figurej^it is possible to observe the point estimates for the observational 
means of each series (the filtered ones E[^t\Dt]. Note that for all analysed series, the estimated means 
closely parallel the behaviour of the data series. Similar behaviour can also be observed for the estimated 
predictive mean’s {E[yt\Dt-i]), shown in Figure]^ It can also be seen that the estimated 95% HPD 
credibility intervals for the one-step-ahead predictive distributions are well concentrated, containing the 
true observation values in most cases. Point and interval predictive estimates for investment rates for 
some considered instants can be seen in Table |3l 


-NP 



quarter 



quarter 


(a) consumption (c) 


(b) investment (i) 


o 
o 

in - 
o 

in 

^ -I 
o 

I-1-1-1-1-1-1-1-1-1-1- 

1955.1 1965.1 1975.1 1985.1 1995.1 2005.1 



quarter 



quarter 


(c) government expenditure (g) (d) export (e) 

Figure 5: Filtered estimated observational mean (solid line) based on latent states posterior means for 
the quarterly shares of U.K. gross final expenditure for the period from 1955.1 to 2012.3. The points 
represent the true data set. 


The smoothed posterior mean estimates {E[iJ.t\DT]) for all data series are represented in Figure 
Although we treated each time series separately the estimates obtained are consistent, in the sense that, 
at each instant, the sum of the estimated means are approximately one. This behaviour indicates that. 
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quarter 


quarter 


(a) consumption (c) 


(b) investment (i) 





quarter 



1955.1 1965.1 1975.1 1985.1 1995.1 2005.1 


quarter 


(c) government expenditure (g) (d) export (e) 

Figure 6: One-step-ahead prediction for the quarterly shares of U.K. gross final expenditure for the period 
from 1963.2 to 2012.3. The solid line represents the predictive distribution means {E[yt\Dt-i\) and the 
shaded region the 95% HPD predictive credibility intervals. The points represent the true data set in 
each case. 


despite the simplicity of the model used in this application, the behaviour of the series is well captured 
by the proposed model. 


A subsets of the data set used in this application have already been analyzed by Mills (2010). Under 


a classical point of view, Mills (2010) estimated an order 2 VAR model, using a multivariate normal 


distribution to model a transformation of the original data as 

log(^), log(l) and log(0, 


(18) 


where c, i, g and e represent consumption, investment, government expenditure and export rates, 
respectively. 

In order to ascertain whether there is any advantage in analysing the data in their original scale we 
reanalysed these data set transforming them as proposed by Mills (2010) (according to equations ([T^), 
replacing the observational beta distributions with univariate normal distributions for each series. Again 
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investment 


quarter (t) 

Vt 

mean 

mode 

ICg5% 

2011.1 

0.106 

0.117 

0.117 

[0.105 , 

0.131] 

2011.2 

0.112 

0.112 

0.111 

[0.098 , 

0.126] 

2011.3 

0.116 

0.113 

0.113 

[0.101 , 

0.126] 

2011.4 

0.110 

0.116 

0.116 

[0.103 , 

0.130] 

2012.1 

0.107 

0.114 

0.114 

[0.102 , 

0.127] 

2012.2 

0.110 

0.110 

0.110 

[0.097 , 

0.124] 

2012.2 

0.109 

0.112 

0.111 

[0.100 , 

0.125] 


Table 3: Point and interval estimates of the quarterly investment (i) shares of U.K. gross final expenditure 
based on the predictive distributions p{yt\Dt-i). 


CD 



1955.1 1965.1 1975.1 1985.1 1995.1 2005.1 

quarter 


Figure 7: Smoothed posterior estimates for the observational means for quarterly consumption (c), 
investment (i), government expenditure (g) and export (e) shares concerning expenditure in the UK 
economy over the period 1955.1 to 2012.3. Gray continuous lines represent true observations. 

we chose to model each series separately using analogous structures to those adopted in the beta case and 
assuming different discount factors for cases that include dynamics for the latent variables. According to 
the model comparison criteria used in this article, the best htted standard model was the one in which we 
assumed a second-order VAR model for the observational means and a first-order model for the precisions, 
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VAR(2)Pol(l) beta model VAR(2)Pol(l) normal model 



MSE 

LL 

LPD 

MSE 

LL 

LPD 

consumption 

0.400e-4 

759.776 

725.504 

0.061 

472.727 

474.535 

investment 

0.505e-4 

730.774 

698.827 

0.112 

591.338 

374.790 

gov. expenditure 

O.lOle-4 

899.035 

862.609 

0.101 

591.906 

440.599 


Table 4: Mean square error (MSE) based on one-step-ahead forecasting, joint log-likelihood (LL) and 
log-observed predictive density (LPD) based on consumption, investment, government expenditure and 
export rates for the period from 1963.2 to 2012.3, obtained from different models. 

assuming a discount factor equal to 0.90 to specify the error evolution covariance matrices of the latent 
variables associated with precision structure. 

To compare the performance of the best beta model with the corresponding normal one (both with 
Var(2)Pol(l)), we recalculated the normal model fit measures correcting each measure through the 
Jacobian of the transformation, in order to obtain adjustment measures in a same scale. The results 
for the fit measures for the different models can be seen in Table Its possible to see that all the criteria 
that take into account one-step-ahead predictive distribution estimates of each of the series indicate a 
better performance of the beta model. Indeed, for the three considered series, the beta model had lower 
MSE and higher LL and LPD for all cases, giving evidence that the modelling of the data in their original 
scale has advantages regarding the predictive ability of the model. 


5 Conclusions and Additional Comments 


In this paper we propose a method for estimation and prediction of dynamic models whose observations 
follow distributions of the two-parameter exponential family. The estimation in the proposed partially 
specified model class, represented by equations @ and is based on a extension of the conjugate 


updating algorithm of West et al. (1985). The main idea of this new method is to explore properties of 


conjugacy in the exponential family and linear Bayes estimation, allowing the quick updating of both 
mean and precision model parameters through analytical strategies, avoiding computationally intensive 
methods such as those based on Monte Carlo estimation. 

Our algorithm stands out mainly for two reasons: first it treats a very general class of models with 
observations in the exponential family, which allows modelling data in their original scale, such as in 
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McCullagh and Nelder (1989)’ MLG. Second, the introduction of a second link function in the model 


allows treatment of overdispersion and heteroscedasticity in data, and allows the precision structure of 
the model to be dynamically treated, efficiently capturing the data behaviour even through the use of 
partially specified models. 


Simulated studies presented by Souza (2013), assuming different observational models in the two- 
parameter exponential family, show that the proposed method generated satisfactory results both as 
regards obtaining point and interval estimates for the parameters, as in steps-ahead forecasting. The 
applications to real data presented in Section of this paper also illustrate the good performance of the 
proposed algorithm and demonstrate the relevance of modelling data in their original scale. 

Although use of MMG has been shown to be a good alternative to reduce the dimensionality of the 
system treated in Section 3.2 we intend to study other alternatives for reducing the system ([^. Also 
with respect to the use of the generalized method of moments, we intend to study the choice of weights 
matrix rik with the aim of checking whether there is any gain in quality of estimates by introducing an 


iterative choice of weights matrix flk, as discussed in Newey (1993) and Hamilton (1994). 

As the main extension of this work we intend to extend the conjugate updating algorithm in order 
to treat classes of multi-parameter and multivariate models, such as models whose observations follow 
Dirichlet or multinomial distributions, the parameters of which can be explained by different link 
functions. 
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